The Boltzmann/Shannon entropy 
as a measure of correlation 
Abstract

It is demonstrated that the entropy of statistical mechanics and
of information theory, $S(p) = -\sum_i p_i \log p_i$, may be viewed as a
measure of correlation. Given a probability distribution on two discrete
variables, $p_{ij}$, we define the correlation-destroying transformation
$C : p_{ij} \to \pi_{ij}$, which creates a new distribution on those same
variables in which no correlation exists between the variables, i.e.
$\pi_{ij} = P_i Q_j$. It is then shown that the entropy obeys the relation
$S(p) \le S(\pi) = S(P) + S(Q)$, i.e. the entropy is non-decreasing under
these correlation-destroying transformations.


The concept of correlation has underlain statistical mechanics from its
inception. Maxwell derived his velocity distribution law [1] by asserting that
such a distribution $\Phi(\mathbf{v})$ for an ideal gas should obey two
properties: (1) the velocity distribution along each axis should be
uncorrelated, i.e.

$$\Phi(\mathbf{v})\,d^3v = (\phi(v_x)\,dv_x)(\phi(v_y)\,dv_y)(\phi(v_z)\,dv_z)$$

and (2) the velocity distribution should show no preferred orientation,
$\Phi(\mathbf{v})\,d^3v = f(v)\,d^3v$, where $v$ is the norm of $\mathbf{v}$.
He showed that these two assumptions lead to the velocity distribution
$\Phi(\mathbf{v})\,d^3v = \exp(-\alpha v^2)\,d^3v$, where $\alpha$ is a
positive constant (later shown by Boltzmann to be equal to $\frac{m}{2kT}$).
(This reduces to the more familiar form by writing this expression in polar
coordinates, $\Phi(v_r, v_\theta, v_\phi)\,d^3v = \exp(-\alpha v^2)\,v^2\,dv_r\,dv_\theta\,dv_\phi$.)

[1] Maxwell, J.C., Phil. Soc., 1860

Inspired by the observations of Ochs [2], we would like to show that the
Boltzmann/Shannon formula for entropy may be viewed as a measure of
correlation, by showing that for a class of transformations which destroys
correlations between variables in a probability distribution, the
Boltzmann/Shannon formula for the entropy is non-decreasing.

Suppose that we have a set of states, $\{X_{ij}\}$, indexed by their values
along two distinct, discrete variables A and B, where $i$ runs over the set of
$n$ discrete states of A, and $j$ runs over the set of $m$ discrete states of
B. Consider a probability distribution over these states, $p_{ij}$. If A and B
are uncorrelated, there exist some $P_i$ and $Q_j$ such that
$p_{ij} = P_i Q_j \;\; \forall i, j$. In this case, it is apparent that the
entropy obeys the property $S(p) = S(P) + S(Q)$.
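The additivity of the entropy for uncorrelated variables can be checked numerically. The following is a minimal sketch, not part of the original argument: the helper names (`entropy`, `product_distribution`, `joint_entropy`) and the example values of P and Q are illustrative assumptions, and the natural logarithm is used throughout.

```python
import math

def entropy(probs):
    """Boltzmann/Shannon entropy S = -sum p ln p, taking 0 ln 0 as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def product_distribution(P, Q):
    """Uncorrelated joint distribution p_ij = P_i * Q_j, as a list of rows."""
    return [[Pi * Qj for Qj in Q] for Pi in P]

def joint_entropy(p):
    """Entropy of a joint distribution stored as a list of rows."""
    return entropy(x for row in p for x in row)

# Illustrative (assumed) distributions over A (n = 3) and B (m = 2).
P = [0.5, 0.3, 0.2]
Q = [0.6, 0.4]
p = product_distribution(P, Q)

# The two printed values should agree up to floating-point rounding.
print(joint_entropy(p), entropy(P) + entropy(Q))
```

Any other choice of normalized P and Q should give the same agreement, since the identity follows from $\ln(P_i Q_j) = \ln P_i + \ln Q_j$.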

However, in general, this will not be true. But, given an arbitrary $p_{ij}$,
we can define the following transformation, which in effect destroys the
correlations between its dependence on A and on B:

$$C : p_{ij} \to \pi_{ij} \tag{1}$$

$$\pi_{ij} = P_i Q_j \tag{2}$$

$$P_i = \sum_{j=1}^{m} p_{ij} \tag{3}$$

$$Q_j = \sum_{i=1}^{n} p_{ij} \tag{4}$$

We assert that the entropy is non-decreasing under such transformations, i.e.

$$S(p) \le S(\pi) \tag{5}$$
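The transformation C and the asserted inequality can be sketched in a few lines. This is an illustration under assumptions, not the author's code: the function names (`marginals`, `decorrelate`) and the 2×2 example distribution are hypothetical choices made here.

```python
import math

def entropy(probs):
    """Boltzmann/Shannon entropy S = -sum p ln p, taking 0 ln 0 as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def marginals(p):
    """Row sums P_i = sum_j p_ij and column sums Q_j = sum_i p_ij."""
    P = [sum(row) for row in p]
    Q = [sum(col) for col in zip(*p)]
    return P, Q

def decorrelate(p):
    """Correlation-destroying map C: p_ij -> pi_ij = P_i * Q_j."""
    P, Q = marginals(p)
    return [[Pi * Qj for Qj in Q] for Pi in P]

# Assumed example: A and B tend to take the same value, so they are correlated.
p = [[0.4, 0.1],
     [0.1, 0.4]]
pi_dist = decorrelate(p)

S_p = entropy(x for row in p for x in row)
S_pi = entropy(x for row in pi_dist for x in row)
print(S_p, S_pi)  # S_p should not exceed S_pi
```

Applying `decorrelate` a second time leaves the distribution unchanged, because the marginals of $\pi_{ij}$ are again $P_i$ and $Q_j$.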

To demonstrate this assertion, we first need to demonstrate a fundamental
property of the Boltzmann/Shannon entropy formula, the averaging property.
Given a set of states $\{Y_k\}$, and two probability distributions defined over
these states, $G = \{g_k\}$ and $H = \{h_k\}$, one may construct a third
distribution, $U = \{u_k\}$, $u_k = \frac{1}{\alpha + \beta}(\alpha g_k + \beta h_k)$,
the weighted average of $G$ and $H$, where $\alpha$ and $\beta$ are arbitrary
non-negative real weights with $\alpha + \beta > 0$. We assert that

$$S(U) \ge \frac{1}{\alpha + \beta}\left(\alpha S(G) + \beta S(H)\right) \tag{6}$$

[2] Ochs, W., Rep. Math. Phys., 9, 135 (1976)

This assertion can be demonstrated by observing that the analogous inequality
holds term-by-term in the sum. Defining $\sigma(x) = -x \ln x$, the averaging
property will hold if

$$\sigma(u_k) \ge \frac{1}{\alpha + \beta}\left(\alpha \sigma(g_k) + \beta \sigma(h_k)\right) \tag{7}$$

This property of $\sigma$ follows from its being concave everywhere over the
domain of interest, $x \in [0, 1]$, i.e. $\sigma''(x) = -1/x < 0$ for $x > 0$.

Note, too, that a consequence of the averaging property of entropy is that,
given a set of different distributions over $\{Y_k\}$, $Z^1, Z^2, Z^3, \ldots$,
and a set of non-negative weights $w_k$ with $\sum_k w_k = 1$, the entropy of
$Z = \sum_k w_k Z^k$, the weighted average of all these distributions, obeys
the following property:

$$S(Z) \ge \sum_k w_k S(Z^k) \tag{8}$$
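The averaging property, in both its two-distribution form (6) and its general form (8), can be spot-checked numerically. The sketch below is an illustration only: the helper names (`mixture`, `averaging_gap`) and all example distributions and weights are assumptions introduced here. Weights are taken to be non-negative and are normalized by their sum, so the case $\alpha = 2$, $\beta = 1$ is handled the same way as weights that already sum to one.

```python
import math

def entropy(probs):
    """Boltzmann/Shannon entropy S = -sum p ln p, taking 0 ln 0 as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mixture(dists, weights):
    """Weighted average of distributions over the same states.

    Weights must be non-negative; they are normalized by their sum.
    """
    total = sum(weights)
    n_states = len(dists[0])
    return [sum(w * d[k] for w, d in zip(weights, dists)) / total
            for k in range(n_states)]

def averaging_gap(dists, weights):
    """S(mixture) minus the weighted average of the individual entropies.

    Concavity of sigma(x) = -x ln x implies this is never negative.
    """
    total = sum(weights)
    mean_entropy = sum(w * entropy(d) for w, d in zip(weights, dists)) / total
    return entropy(mixture(dists, weights)) - mean_entropy

# Two-distribution case with assumed G, H and alpha = 2, beta = 1.
G = [0.7, 0.2, 0.1]
H = [0.1, 0.3, 0.6]
gap_two = averaging_gap([G, H], [2.0, 1.0])

# General case with three assumed distributions and weights summing to 1.
Z = [[0.90, 0.05, 0.05],
     [0.20, 0.50, 0.30],
     [1 / 3, 1 / 3, 1 / 3]]
gap_many = averaging_gap(Z, [0.5, 0.3, 0.2])

print(gap_two, gap_many)  # both should be non-negative
```

The gap vanishes when all the averaged distributions are identical, which is the equality case of the concavity argument.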

Returning to the fundamental assertion, equation (5), this may be
demonstrated by recalling a fundamental property of the Boltzmann/Shannon
entropy formula, one which Shannon [3] took not as a derived property but
rather as an axiomatic property that an entropy functional must have: If we
decompose the distribution $p_{ij}$ into two stages, where initially we
distribute among the states over A by the distribution $P_i$, and next we
distribute among the states of B by the distribution $C^{(i)}$ such that
$p_{ij} = P_i C^{(i)}_j$, the entropy obeys the formula

$$S(p) = S(P) + \sum_i P_i S(C^{(i)}) \tag{9}$$

In the general case, each of the distributions $C^{(i)}$ will be different,
i.e. the variables A and B are correlated. The distribution $Q_j$ above
represents a weighted average of the $C^{(i)}$'s, weighted by $P_i$, i.e.

$$Q_j = \sum_i P_i C^{(i)}_j \tag{10}$$
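Both the two-stage decomposition (9) and the statement that Q is the P-weighted average of the conditional distributions (10) can be verified on a small example. This is a sketch under assumptions: the `decompose` helper and the 2×3 joint distribution are hypothetical, and every $P_i$ is assumed to be strictly positive so that the division is defined.

```python
import math

def entropy(probs):
    """Boltzmann/Shannon entropy S = -sum p ln p, taking 0 ln 0 as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def decompose(p):
    """Split p_ij into P_i and conditionals C^(i)_j = p_ij / P_i.

    Assumes every row sum P_i is strictly positive.
    """
    P = [sum(row) for row in p]
    C = [[x / Pi for x in row] for row, Pi in zip(p, P)]
    return P, C

# Assumed correlated joint distribution over n = 2 states of A, m = 3 of B.
p = [[0.30, 0.10, 0.10],
     [0.05, 0.25, 0.20]]
P, C = decompose(p)

# Equation (9): S(p) = S(P) + sum_i P_i S(C^(i)).
lhs = entropy(x for row in p for x in row)
rhs = entropy(P) + sum(Pi * entropy(Ci) for Pi, Ci in zip(P, C))

# Equation (10): Q_j = sum_i P_i C^(i)_j, compared with the column sums.
Q_from_C = [sum(Pi * Ci[j] for Pi, Ci in zip(P, C)) for j in range(len(p[0]))]
Q_direct = [sum(col) for col in zip(*p)]

print(lhs, rhs)
print(Q_from_C, Q_direct)
```

Because the two rows of this example have different conditional distributions, $S(Q)$ exceeds $\sum_i P_i S(C^{(i)})$, which is the step that turns (9) into an inequality.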



[3] C. Shannon and W. Weaver, The Mathematical Theory of Communication,
Urbana: Univ. of Ill. Press, 1949
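Hence, the Shannon axiom with the averaging property of the entropy leads
to the desired assertion. Since $Q$ is the average of the $C^{(k)}$ weighted by
$P_k$, the averaging property gives $S(Q) \ge \sum_k P_k S(C^{(k)})$, and
$S(\pi) = S(P) + S(Q)$ because $\pi_{ij} = P_i Q_j$ is uncorrelated:

$$S(p) = S(P) + \sum_k P_k S(C^{(k)}) \tag{11}$$

$$S(p) \le S(P) + S(Q) \tag{12}$$

$$S(p) \le S(\pi) \tag{13}$$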

Jaynes [4] showed how Shannon's theory could be merged with statistical
mechanics, leading to the conceptualization of the thermodynamic principle
of maximum entropy as a principle expressing that the distribution of energy
among the microstates should be that distribution which is least-biased, given
the constraint of a specified temperature.

Viewing the entropy $-k \sum_i p_i \ln p_i$ as a measure of (lack of)
correlation provides a new twist to Jaynes' perspective. One may say that the
equilibrium distribution is that distribution which is least-correlated given
the constraint(s). The Second Law of Thermodynamics may be rephrased to state
that correlations are highly unlikely to arise spontaneously, and that the
natural course of evolution of a system is one in which correlations diminish.

Thinking about entropy as a measure of correlation exposes a key implicit
assumption in both Boltzmann's theory and Shannon's theory: the
individual (micro)states are assumed to be uncorrelated. This harks back to
Laplace's balls-in-urns, where the probability of finding a ball in a given urn
is generally uncorrelated with the probability of finding a ball in a different
urn. If the probabilities of occupation of the states are intrinsically
correlated, the maximum entropy distribution cannot be viewed as the correct,
least-biased distribution. In communication theory and statistical mechanics,
this assumption may in certain circumstances be valid, but it breaks down
severely when we attempt to take the limit to a continuous set of states. If
one constructs a discrete set of bins from an intrinsically continuous
variable, the degree of bin-bin correlation grows as these bins become
steadily finer. This leads to the question: what is the correct measure of
entropy for a continuous distribution?
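The difficulty with the continuous limit can be illustrated numerically. The sketch below is not taken from the paper: it assumes a standard normal variable, an illustrative cutoff at ±8, and hypothetical helper names. It shows only that the discrete entropy of the binned distribution keeps growing as the bins are refined (by roughly ln 2 each time the bin width is halved) rather than converging, which is one reason the discrete formula cannot simply be carried over to a continuous distribution.

```python
import math

def entropy(probs):
    """Boltzmann/Shannon entropy S = -sum p ln p, taking 0 ln 0 as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def binned_normal(width, half_range=8.0):
    """Probabilities of equal-width bins covering [-half_range, half_range].

    The tiny probability mass outside the range is ignored (an assumption
    made for simplicity).
    """
    n_bins = int(round(2 * half_range / width))
    edges = [-half_range + k * width for k in range(n_bins + 1)]
    return [normal_cdf(b) - normal_cdf(a) for a, b in zip(edges, edges[1:])]

# Halve the bin width repeatedly and watch the discrete entropy grow.
for width in (0.4, 0.2, 0.1, 0.05):
    print(width, entropy(binned_normal(width)))
```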

Dill [5] points out that these issues of correlation and additivity pervade our
thinking about the fundamental aspects of chemical and biological phenomena.
He highlights some of the pitfalls which one may encounter in settings
for which an assumption of non-correlation may not be valid.

[4] Jaynes, Phys. Rev., 106, 620 (1957)
[5] Dill, K.A., J. Biol. Chem., 272, 701 (1997)



